DOCUMENT RESUME

ED 152 [?]20                                        PS 00[?] [?]42

AUTHOR        House, Ernest R.; Hutchins, Elizabeth J.
TITLE         Issues Raised by the Follow Through Evaluation.
INSTITUTION   ERIC Clearinghouse on Early Childhood Education,
              Urbana, Ill.
SPONS AGENCY  National Inst. of Education (DHEW), Washington, D.C.
PUB DATE      77
EDRS PRICE    MF-$0.83 HC-$1.67 Plus Postage.
DESCRIPTORS   *Demonstration Programs; *Early Childhood Education;
              Educational Accountability; Educational Assessment;
              Evaluation Methods; Federal Programs; Government
              Role; Primary Education; Program Effectiveness;
              *Program Evaluation; *Program Validation; *Research
              Design; *Research Problems; Statistical Bias
IDENTIFIERS   *Project Follow Through

This paper presents a discussion of issues raised in the evaluation of
Project Follow Through reported by Abt Associates. The paper suggests
that many of the problems inherent in the design of both the program and
the evaluation stem from the underlying assumption that one educational
model could be found which would best alleviate the educational problems
of the poor. The paper suggests that even when the original evaluation
design was modified, substantial problems remained. The major issues and
problems discussed in the paper include: (1) the belief in the existence
of a best program; (2) the problem of relying on test scores; (3) the
issue of program staff knowing the content of evaluation instruments and
teaching to the test; (4) problems involved in designing or choosing
valid instruments; (5) the existence of large intersite variations
within the same models; (6) problems involved in implementing a
particular model in varying sites; (7) statistical problems,
particularly in the use of the analysis of covariance and the use of
individual rather than class scores in the present evaluation; (8)
problems involved in large scale experiments; (9) the fairness of the
evaluation in terms of original intentions and later changes; (10) press
coverage which tended to distort evaluation results, especially the
invalid assumption that the basic skills programs were the most
effective; and (11) general questions of government policies which
shaped the evaluation procedures and led to many of the subsequent
problems. (BD)

In April of 1977, Abt Associates, Inc. (AAI) released the long-awaited
evaluation of the Follow Through program. The AAI evaluation compared
thirteen models of early childhood education, ranging from highly
structured to open education approaches. The news media seized upon the
findings as evidence that models labeled "basic skills" succeeded better
than those labeled "cognitive" or "affective." The evaluation itself was
strongly criticized by a panel of evaluation experts (House, Glass,
McLean, and Walker, 1977). The evaluation is a porcupine of issues, some
of which are discussed in this chapter.

Follow Through began in 1967 as a service program to continue the
education of disadvantaged children, particularly those who had attended
Head Start. It quickly ran into funding difficulties when the expected
$120 million was reduced to only $15 million for the first year.
Officials inside the federal bureaucracy decided to convert Follow
Through into a "planned variation" experiment. That is, the government
would support several types of early childhood models and eventually
evaluate them to see which worked best. This plan would enable Follow
Through to continue.

Follow Through in its earliest planning stages was thought to be a
program that could address change within institutions involving
communities and families as well as schools. However, when Follow
Through was designed as a planned variation experiment, the focus became
less that of changing social institutions and more that of finding
effective techniques of educating poor children in the existing school
institutions. Program planners chose to limit the program to finding
techniques of schooling that would work better than traditional
practices. In this way, the social service aspect of Follow Through was
de-emphasized not only by the narrowness of the evaluation but also by
the planners who chose the planned variation design.

The question to be answered by the evaluation was "what worked best" or
"what worked most efficiently," as opposed to questions such as "how
does it work" or "how can we make it work better." The history of the
evaluation can be traced in excellent works by Haney (1977) and Elmore
(1976). The policy which produced the evaluation has been analyzed by
McLaughlin (1975) and House (1978).

The entire Follow Through program was begun in a social milieu of "can
do." At that time in the sixties most educational reformers subscribed
to the "big bang" theory of reform. They believed it was possible to
discover a technique or a program that would "solve" a "problem" such as
poor students failing to achieve in school. Not only could such a
technique be found, but with some effort it could be disseminated all
over the country, thus solving the social problem. Hence, the solution
would be relatively cheap as well as effective.

Given such a belief, it became the mission of the federal government to
discover techniques and disseminate them. First, the government had to
find out what worked. Thus the strategy, clearly enunciated in the White
House Conference on Education in 1967 (Elmore, 1976), was a matter of
rectifying educators' ignorance. All would be well when the successful
program was found.

The reformers ran into difficulties, however. Early evaluation results
from Head Start and Title I, ESEA, indicated that the new reform
programs were unsuccessful in raising the standardized test scores of
the children involved. Federal officials attributed this failure to
inadequate variation in, and control over, the programs. They concluded
that efforts should be devoted to developing different programs and then
systematically evaluating them. Hence, "planned variation," rather than
natural variation, became a reform strategy. Follow Through was the
first attempt at planned variation.

Sponsors (those developing the new models of early childhood education)
and sites (school districts implementing the new models) were chosen by
the Office of Education. Both sponsors and sites received funding from
the federal government. At a meeting in Kansas City in 1968, sponsors
and sites were matched to each other. Both development and
implementation of the models, which were directed at poor children from
kindergarten through third grade, began immediately.



From the viewpoint of the federal officials, particularly those within
the Office of the Assistant Secretary for Planning and Evaluation inside
the Department of Health, Education, and Welfare who had been
instrumental in the planned variation conversion, evaluation was a
critical part of the Follow Through plan. It would tell which model
worked best and at what cost (Rivlin, 19[?][?]). Also, the federal
planners had a particular idea of what evaluation should be: a massive,
controlled experiment. It was a popular view of evaluation at that time,
though not one universally shared within the evaluation community.

Consequently, the evaluation was set up as a large-scale experiment with
comparison groups assigned for each of the Follow Through classes.
Comparisons would be made between the Follow Through and non-Follow
Through classes. A huge, multi-million dollar contract was awarded to
the Stanford Research Institute (SRI) to conduct the evaluation. SRI
promised to evaluate all aspects of the program, including community
involvement, institutional change, and so on.

However, when the evaluation began, SRI collected primarily standardized
test scores. This upset many of the sponsors, and they protested
vociferously. SRI assured them that the less tangible goals of their
models would be assessed in addition to the more traditional outcomes as
measured by standardized achievement tests. In fact, SRI began a serious
effort to develop special measures appropriate to the expressed goals of
the sponsors' models.

Meanwhile, the political pressure was intense in Washington to expand
the number of sponsors and sites. Special interest groups like blacks
and bilinguals wanted their own sponsors. Political groups like the
large cities wanted to become sites. The new groups were accommodated.
Sponsors and sites were added in an opportunistic fashion, measurably
increasing the political constituency and strength of the program in
Congress. The Follow Through budget began to grow.

This caused problems elsewhere, however. Exact comparison groups were
difficult to find. Often controls were established that were very unlike
the Follow Through classes. The program administrators were aware of
these deviations from the evaluation plan, but they felt that the new
models would be so much more effective than what the public schools were
doing, and the gains in test scores would be so dramatic, that it would
not matter whether the comparison classes were closely matched to the
Follow Through classes.

Follow Through grew larger and larger. At its zenith there were more
than twenty sponsors operating in over one hundred eighty sites.
Hundreds of thousands of children were involved. SRI tried to collect
data on most of them, but the logistics of data collection and the costs
bounded out of control. Furthermore, SRI was unable to develop the new
instruments it had promised. Amidst investigation by the General
Accounting Office, HEW, and Nader's Raiders, the evaluation became a
scandal. Finally, in 197[?], the Follow Through administrators in the
Office of Education resigned and the evaluation was reshaped. SRI had
spent $12 million on the evaluation in the program's first four years.

Under the direction of the Office of Education the evaluation was pared
down to seventeen sponsors working with eighty sites. The analytic
sample contained only twenty thousand children. More importantly, the
number of instruments to collect data was narrowed to only four
standardized measures. SRI continued to collect the data, but the data
analysis was contracted to Abt Associates. In all, the broad scope of
the evaluation was drastically narrowed in what Haney (1977) called a
"tunneling" effect. The early childhood models would now be compared on
only a few standardized tests to determine which was "best."

Throughout the course of the evaluation the sponsors, parents of the
children, and site personnel were by no means silent in their objection
to events. Most continued to complain, often bitterly, about the
evaluation, fearing their models and their children would not be
assessed by appropriate criteria. Many never accepted the conversion of
the entire Follow Through program to an experiment. They saw Follow
Through as a program providing social services to children and their
families. Sponsors primarily saw it as a development program.

Faced with the problem of analyzing the test data from nonequivalent
(and often mismatched) Follow Through and comparison classes, the Abt
Associates analysts resorted to a complex statistical analysis to try to
correct the problem as best they could. The technique chosen, analysis
of covariance, adjusts the final test scores of children in such a way
that their test scores are made more equivalent based upon the
achievement test scores of the students, the income level of the
parents, and other variables recorded at the beginning of school.
Presumably, after the statistical adjustment the scores of the two
classes will be more like the scores of two properly matched classes.
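
The adjustment described above can be sketched in a few lines. This is
only an illustration, assuming a single covariate (a pretest score) and
tiny made-up data; the AAI analysis used several covariates, and every
name and number below is hypothetical rather than taken from the report.

```python
def mean(values):
    return sum(values) / len(values)

def pooled_within_slope(groups):
    """Pooled within-group slope of posttest on pretest.

    `groups` is a list of (pretest_scores, posttest_scores) pairs.
    """
    sxy = 0.0
    sxx = 0.0
    for pre, post in groups:
        pre_mean, post_mean = mean(pre), mean(post)
        sxy += sum((x - pre_mean) * (y - post_mean) for x, y in zip(pre, post))
        sxx += sum((x - pre_mean) ** 2 for x in pre)
    return sxy / sxx

def adjusted_difference(program, comparison):
    """Covariance-adjusted posttest difference (program minus comparison)."""
    slope = pooled_within_slope([program, comparison])
    raw = mean(program[1]) - mean(comparison[1])
    pretest_gap = mean(program[0]) - mean(comparison[0])
    return raw - slope * pretest_gap

# Hypothetical data: (pretest scores, posttest scores) for each group.
# The comparison class starts ahead, and neither class gains anything extra.
program = ([10, 12, 14], [20, 24, 28])
comparison = ([14, 16, 18], [28, 32, 36])

raw_difference = mean(program[1]) - mean(comparison[1])
adjusted = adjusted_difference(program, comparison)
print(raw_difference, adjusted)  # -8.0 0.0
```

With error-free pretest scores the raw eight-point gap disappears after
adjustment, which is the outcome the technique is meant to deliver.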

Unfortunately, this statistical technique has proved to be much more
unreliable in practice than it was believed to be in the late sixties
and early seventies (Cronbach, 1977; Campbell and Erlebacher, 1975). The
actual test score adjustments are such that the error in the procedure
is quite large. The technique has now become controversial among
statisticians. The entire AAI evaluation of Follow Through is based upon
it.
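
One commonly cited reason for this unreliability is that the covariate
itself is measured with error, so the adjustment removes only part of
the initial gap between groups. The simulation below is a rough sketch
of that mechanism under assumed values (group means, noise levels, and
sample sizes are all hypothetical); it is not a reconstruction of the
specific flaws found in the AAI analysis.

```python
import random

def mean(values):
    return sum(values) / len(values)

def adjusted_difference(program, comparison):
    """Single-covariate adjustment using the pooled within-group slope."""
    sxy = sxx = 0.0
    for pre, post in (program, comparison):
        pre_mean, post_mean = mean(pre), mean(post)
        sxy += sum((x - pre_mean) * (y - post_mean) for x, y in zip(pre, post))
        sxx += sum((x - pre_mean) ** 2 for x in pre)
    slope = sxy / sxx
    raw = mean(program[1]) - mean(comparison[1])
    return raw - slope * (mean(program[0]) - mean(comparison[0]))

def simulate_group(rng, ability_mean, n):
    """Pretest and posttest are both noisy readings of the same ability."""
    pre, post = [], []
    for _ in range(n):
        ability = rng.gauss(ability_mean, 10)
        pre.append(ability + rng.gauss(0, 10))   # unreliable covariate
        post.append(ability + rng.gauss(0, 10))  # no program effect added
    return pre, post

rng = random.Random(0)  # fixed seed so the run is repeatable
program = simulate_group(rng, ability_mean=45, n=2000)
comparison = simulate_group(rng, ability_mean=55, n=2000)

raw_difference = mean(program[1]) - mean(comparison[1])
adjusted = adjusted_difference(program, comparison)
print(round(raw_difference, 1), round(adjusted, 1))
```

Although no program effect is built into the data, the adjusted
difference stays well below zero, so the program group appears to have
been harmed. The size of the leftover gap depends on how noisy the
pretest is assumed to be.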

Another controversial aspect of the evaluation was whether to use
individual student scores or class averages in the data analysis. The
AAI analysis uses only individual student scores. It has been
demonstrated with the Follow Through data that one can obtain
dramatically different results using class rather than student scores.
Many leading authorities say AAI should have used the class scores
instead. This is known as the "units of analysis" problem.

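A small worked example shows one simple way the choice of unit can
change a result. The class sizes and scores below are invented for
illustration and are not Follow Through data; the sketch covers only the
weighting of classes, not the related question of how standard errors
should be computed.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical class rosters: every student in a class is given the
# class's score, purely to keep the arithmetic easy to follow.
program_classes = [[60] * 30, [40] * 5]
comparison_classes = [[58] * 5, [42] * 30]

def student_level_mean(classes):
    """Pool all students, so large classes carry more weight."""
    return mean([score for roster in classes for score in roster])

def class_level_mean(classes):
    """Average the class means, so every class counts once."""
    return mean([mean(roster) for roster in classes])

student_difference = (student_level_mean(program_classes)
                      - student_level_mean(comparison_classes))
class_difference = (class_level_mean(program_classes)
                    - class_level_mean(comparison_classes))
print(round(student_difference, 2), class_difference)  # 12.86 0.0
```

Using students as the unit, the program appears almost thirteen points
ahead; using classes as the unit, the two groups are tied.
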
In spite of these and other difficulties, Abt Associates published its
results in April, 1977. Based on their controversial techniques, they
drew two types of conclusions. One conclusion was that the differences
in results from site to site were very great. In other words, even
within the same model, e.g., Direct Instruction, many of the sites did
better in test results than the comparison classes, but in at least two
or three Direct Instruction sites the results were much worse than in
the comparison classes. This great intersite variation held for all
models. The results varied dramatically from site to site for every one.

In fact, the intersite variation among models was so great that the AAI
analysts refused to say that any particular model was best. Put in
another manner, differences between the sites within a given model were
nearly as great as the differences among the models. This was an
embarrassment for a study which had been predicated on the idea of
identifying the "best" model. In fact, the Office of Education insisted
that AAI continue with the comparisons of models in spite of AAI's
strong reservation about doing so. In one critique of the evaluation,
the finding of great intersite variation within models seemed to remain
valid despite the many flaws of the evaluation (House, Glass, McLean,
and Walker, 1978).

The second set of AAI findings revolved around the classification of the
early childhood models on the basis of the models' goals. AAI classified
models into three types: basic skills, cognitive/conceptual, and
affective/cognitive. AAI also classified the outcome measures into basic
skills, cognitive, and affective. This dual classification seemed to be
extremely arbitrary and perhaps mistaken.

The AAI analysts then matched the appropriate type of model with the
corresponding outcome measure. In other words, one would expect the
so-called "basic skills" models to do better on the basic skills
measures, and so on. This gave a semblance of fairness to the evaluation
and disguised the fact that the evaluation primarily consisted of
standardized achievement tests, the traditional measures on which one
might expect "basic skills" models, which emphasized rote learning
skills found on such tests, to do better.



The AAI analysts found that the "basic skills" models did better on both
basic skills outcomes and on the affective measures. The "cognitive" and
"affective" models did better on none of the measures. All this, of
course, was within the context of powerful intersite variation, which is
to say that any given site from a particular "affective" model might do
extremely well on all the measures. For example, although AAI ranked the
Bank Street model in the middle in terms of effects, one of its sites
was among the best.



In the milieu of the times, the AAI finding that "basic skills" models
were better was seized upon by the mass media, and the finding of
intersite variation was virtually ignored. Articles were carried in the
New York Times, Wall Street Journal, Washington Post, Newsweek, and most
of the major newspapers in the country. The newspapers were not careful
in their coverage of the findings, simplifying the results considerably.
Even the AAI analysts were moved to protest the distorted coverage in
the Boston newspapers.

Perhaps the most widely circulated report was that of the conservative
syndicated columnist, James J. Kilpatrick. In a column that can only be
labeled a parody of the AAI report, Kilpatrick wondered why it had taken
the educators so much time and money to discover the obvious about
schooling. His view of the "basic skills" models was no closer to
reality than his description of the "affective" models. His column was
widely circulated across the country under various headlines supplied by
local newspapers, including "Basics Beats Funsies in School," "A Nation
of Illiterates," and "Basic Education Offers Alternatives to
Numbskulls."

In response to protests about the evaluation by sponsors and others, the
Ford Foundation funded a third party critique of the evaluation by a
panel of evaluation experts. The panel found the intersite variation
finding substantially correct. However, the findings comparing the
models were invalid. The critique asserted that the evaluation contained
a series of errors and inadequacies.

For example, the narrowness of the scope of measurement and its bias
towards certain models precluded statements about which models were
best. Some of the instruments had questionable reliability, and the
classification of both the models and measures was misleading.
Furthermore, the evaluation contained two substantial statistical flaws.
When these flaws were corrected, no models or model types proved to be
better, even given the traditional measures used as outcome data.
Nonetheless, the finding of intersite variation held up under the
reanalysis of the critique. (For the full version of the critique, see
House, Glass, McLean, and Walker, 1978.)

A number of major issues were raised by the Follow Through evaluation.

The Big Bang Theory. The idea that one can invent a model program that
will "solve" the problems of disadvantaged children across the country
was a strong element in the Follow Through program. That belief now
seems to be dissipating slowly but steadily. The originators of the
program, and possibly the sponsors, thought they could invent
educational treatments that would be far superior to the schooling
traditionally offered by the public schools. Such dramatic gains were
not forthcoming.

It may be that any gains were severely masked by the narrowness of the
traditional tests used to measure outcomes of the early childhood
models. As a whole, though, the data did not show dramatic results. In
the AAI evaluation the Follow Through models as a whole did no better
than did the public school classes to which they were compared. It
should be noted that the comparison classes themselves were often
classes enriched by Title I and other compensatory programs.



Faith in Testing. The faith in testing was strong. Government planners
never wavered from the view that gains in scores on standardized
achievement tests were the improvements they wanted, whatever else they
got. To them the gains in test scores were surrogate measures for
improved chances later in life. They insisted that test scores be the
focus of the evaluation.

When the sponsors protested that traditional tests were too narrow to
measure their program outcomes, Stanford Research Institute expressed
confidence that they could develop new measures to cover these new
outcomes. They tried but failed miserably. At the end SRI questioned
their faith in their own ability to develop such measures and cautioned
other test developers.

Most of the sponsors felt that the tests were invalid for their models
and protested vociferously their exclusive use. Yet they persisted in
the program, hoping against hope that in spite of poor tests, their own
models would show up well on them. Faith in their own models led them to
believe they would do well on the tests.

Teaching to the Test. A familiar issue was raised by the evaluation:
teaching to the test. It is clear that teaching the exact items on a
test is an illegitimate activity, unless the items taught comprise the
universe of things to be learned. Most tests, such as standardized
achievement tests, only sample the domain of learning that is being
assessed. Teaching the items invalidates the inference that the student
knows the domain of knowledge the test is sampling.

Other than teaching the items, there are a number of things one might do
to prepare for the test, however. It is about the legitimacy of those
activities that people disagree. At the beginning of Follow Through, it
was certain that an achievement test would be an important component of
the evaluation. Reportedly, one sponsor said, "We don't care what the
test is. Just tell us what it is and we'll teach to it" (Egbert, 1971).
Other sponsors thought this was an infringement.

No matter what various sponsors did once they knew the test was the
Metropolitan Achievement Test (MAT), a test readily available for
inspection, it was true that sponsors did best when their curriculum
materials came closest to the specific subtests. For example, the
strongest performance on any subtest was turned in by Direct Instruction
children on the "language" subtest of the MAT. The language subtest
consists of a section in which the students discriminate between
incomplete sentences and "telling" and "asking" sentences. The other
section calls for identifying errors in capitalization, punctuation, and
usage.

It was on this subtest that Direct Instruction turned in by far the
strongest performance of any sponsor on any test. In fact, this high
score accounts for much of Direct Instruction's effectiveness on basic
skills, since the language subtest was included in basic skills
measures. A comparison of the subtest of the MAT with the third grade
lessons in Distar, the commercial version of Direct Instruction, shows a
close similarity between the two. The format and instruction, though not
specific items, were the same. Distar children are repeatedly drilled on
content similar to the test. Is this teaching to the test? Different
people give different answers. Other sponsors may have geared some of
their materials towards the test too. We point out only one example.

Measurement Problems. The evaluators were unable to come up with
anything resembling a satisfactory instrument to measure the less
tangible outcomes of the models. The two affective instruments had
serious deficiencies. Other studies were attempted, such as assessing
the impact of the models on the communities in which they were
implemented. These were often dismissed for lack of sufficient
reliability. Information was ultimately limited to students filling out
pencil and paper tests. The interviews with parents and questionnaires
to teachers were not fully treated in the final evaluation report. The
focus of the evaluation was exceptionally narrow.

Intersite Variation. Even within the traditional tests employed there
was enormous variation in outcomes between sites within the same models.
This was the one constant finding of the study. Local circumstances
(parents, teachers, peers, school environment, home environment) seemed
to have a great effect on the classroom outcomes. Even where the early
childhood models had their greatest effect on test achievement, their
influence was very modest, less than ten percent of the overall
variation in test scores. Local conditions had a far stronger impact.
This finding raises questions about both the efficacy and type of
federal intervention in local districts. According to this study,
government programs are very limited in their power to affect
traditional outcomes like test scores.
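
The idea of a model accounting for only a small share of the variation
can be made concrete with a simple variance decomposition. The sketch
below splits the variation in site-level effects into a between-model
part and a within-model part; the model names and numbers are invented
for illustration and are not figures from the AAI report, whose "ten
percent" refers to its own analysis of test scores.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical site-level effects (program minus comparison) by model.
site_effects = {
    "Model X": [-10, 0, 13],
    "Model Y": [-12, 1, 8],
}

def share_explained_by_model(effects_by_model):
    """Between-model sum of squares as a fraction of the total."""
    all_sites = [e for sites in effects_by_model.values() for e in sites]
    grand_mean = mean(all_sites)
    total_ss = sum((e - grand_mean) ** 2 for e in all_sites)
    between_ss = sum(len(sites) * (mean(sites) - grand_mean) ** 2
                     for sites in effects_by_model.values())
    return between_ss / total_ss

share = share_explained_by_model(site_effects)
print(round(share, 3))  # 0.013
```

In this made-up case the two models differ by two points on average, yet
model membership accounts for little more than one percent of the
variation among sites; nearly all of it lies between sites using the
same model.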

Implementation. Uncertainty has always existed as to how fully the early
childhood models were implemented in the school sites. If any situation
is advantageous for model implementation, certainly Follow Through
should have been. The sites volunteered to work with the sponsors. Both
sites and sponsors were paid a substantial sum of money to implement the
models. Sponsors worked with a limited number of sites, usually fewer
than ten, over a long period of time, most for many years. Several
cohorts of children went through the models.

Nonetheless, there is evidence that the implementation was not perfect.
Early observation studies found differences in implementation across
classrooms within models (Stallings, 1975). Furthermore, some sponsors
no doubt implemented their models more successfully than did other
sponsors.



Statistical Problems. The evaluation raised highly technical but
important issues about the statistical analysis employed. The study
demonstrates the limits of analysis of covariance techniques, used in
this case beyond their capabilities. The implications are that certain
types of studies, such as those with non-equivalent control groups,
should not be conducted since the statistical corrections cannot be made
reliably. A great number of studies are of this type.

Second, the evaluation demonstrates without doubt that the unit of
analysis employed (the individual student, the class, or the school
district) has a dramatic effect on the results. The selection of the
unit becomes a major problem in the design of most studies. In most
cases the classroom is probably the appropriate unit of analysis.



Large Scale Experiments. This evaluation throws into question the
utility of all large scale experiments. The costs were exorbitant. The
evaluation alone cost nearly $50 million. The information gained was not
worth the cost. The idea that one can definitively answer major
questions, such as which is the best model of early childhood education,
is dubious. A series of small studies that contribute to developing
knowledge over a period of time is more informative than a massive one.
The expectation that such a large experiment will resolve major problems
is unrealistic.



Fairness. The sponsors were promised early in the evaluation that the
less tangible outcomes of their models would be measured. The evaluation
was unable to deliver on this promise. By the time the true nature of
the evaluation became clear, the sponsors were heavily invested and
entrenched in sites. Thus the evaluation agreement was unfair to the
sponsors. Much of the sponsors' sense of moral outrage can be attributed
to a feeling of being treated unfairly. Evaluators should not make
promises they cannot keep. At the very least they should renegotiate the
understanding between themselves and those being evaluated.

Press Coverage. The interpretation of the Follow Through evaluation was
significantly affected, even distorted, by the mass media. In fact, most
people's perceptions, even professionals', were shaped by the press
coverage rather than the actual study. This is a serious problem. What
the press seems to do is to feed stereotypes that they think the public
already has. This is the line of easiest and most succinct communication
for them. Unfortunately, it also distorts the messages conveyed.

In this case, the press seized upon the "basic skills" label and read
their own meaning into it. Since the public was concerned about "back to
basics," the Follow Through evaluation was fodder for that particular
movement. In the AAI study reading comprehension was not included as a
"basic skills" measure but as a "cognitive" measure. Few parents would
want "basic skills" that exclude reading. Yet the press seized only the
label itself, a label supplied by the AAI analysts.

The damage done by such a misinterpretation is almost impossible to
reverse. The mass media are not interested in corrections of yesterday's
headlines. The fault is not entirely the media's, since only a few
people could understand the AAI report as it was written: two thousand
pages of statistics and tables. The media may seize on the simple and
the familiar and ignore the complex in self-defense. But the potential
for misinformation in the interaction between expertise and the mass
media is formidable. In the case of Follow Through it materialized.

Government Policy. Finally, one must question the government policy that
shaped the Follow Through evaluation. Policy based on the "big bang"
theory assumed there could be an invention or discovery of a model
program that would solve the problem of educating disadvantaged children
in the early years of school. Further, it assumed that models could be
successfully implemented and uniformly effective under any number of
differing local conditions. These assumptions seem considerably more
dubious today than when Follow Through began.

That is not to say that the federal government should not develop new
programs. But one might expect the new programs to have differential
effects in different settings. The same program may be desirable in one
place and not in another, desirable even for one group in the same place
and not for another. It does mean that the government should not
propagate one "best" model to the exclusion of all others, since the
effects may differ, if in fact they are even ascertainable.

A second reform is necessary in the federal government's evaluation
policy. Federal evaluation policy for the last decade has been built on
certain assumptions manifested in the Follow Through evaluation. It has
assumed that there is agreement on program goals and on the outcome
measures, almost always test scores, on which the programs are to be
assessed. It also assumed simple cause and effect relationships.

There are places where such assumptions are valid, where such approaches
to evaluation will work, but the United States as a whole is not one of
them (House, 1978). In a society as pluralistic as the U.S., people
often disagree on goals for schooling. They certainly disagree on
outcome measures by which to assess programs. And the cause and effect
relationships in the social sciences are exceedingly complex. Perhaps
the final judgment is that the Follow Through evaluation was entirely
too simple for the context in which it was employed.



The material in this publication was prepared pursuant to a contract
with the National Institute of Education, U.S. Department of Health,
Education and Welfare. Contractors undertaking such projects under
government sponsorship are encouraged to express freely their judgement
in professional and technical matters. Prior to publication, the
manuscript was submitted to the Area Committee for Early Childhood
Education at the University of Illinois for critical review and
determination of professional competence. This publication has met such
standards. Points of view or opinions, however, do not necessarily
represent the official view or opinions of either the Area Committee or
the National Institute of Education.